A Provably Correct Learning Algorithm for Latent-Variable PCFGs

Authors

  • Shay B. Cohen
  • Michael Collins
Abstract

We introduce a provably correct learning algorithm for latent-variable PCFGs. The algorithm relies on two steps: first, the use of a matrix-decomposition algorithm applied to a co-occurrence matrix estimated from the parse trees in a training sample; second, the use of EM applied to a convex objective derived from the training samples in combination with the output from the matrix decomposition. Experiments on parsing and a language modeling problem show that the algorithm is efficient and effective in practice.
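The second step can be illustrated with a toy case. The sketch below is not the authors' algorithm. It assumes a set of fixed component distributions (hypothetical stand-ins for whatever the matrix decomposition returns) and runs EM over the mixing weights only. In that restricted setting the log-likelihood is concave in the weights, so EM started from an interior point moves toward the global optimum instead of an arbitrary local one. All names and numbers here are illustrative assumptions.

```python
def em_mixture_weights(components, observations, iterations=200):
    """EM over mixing weights only; the component distributions stay fixed.

    components: list of dicts mapping symbol -> probability. These are
        hypothetical stand-ins for quantities recovered by a decomposition.
    observations: list of observed symbols.
    """
    k = len(components)
    weights = [1.0 / k] * k  # interior (uniform) starting point
    for _ in range(iterations):
        expected = [0.0] * k
        for x in observations:
            # E-step: posterior over components for this observation.
            joint = [w * c.get(x, 0.0) for w, c in zip(weights, components)]
            norm = sum(joint)
            if norm == 0.0:
                continue  # observation has zero probability under every component
            for h in range(k):
                expected[h] += joint[h] / norm
        # M-step: renormalise the expected counts into new weights.
        total = sum(expected)
        weights = [e / total for e in expected]
    return weights


# Toy data: P(a) = 0.9 * w + 0.2 * (1 - w) matches the empirical 0.69 at w = 0.7.
components = [{"a": 0.9, "b": 0.1}, {"a": 0.2, "b": 0.8}]
observations = ["a"] * 69 + ["b"] * 31
weights = em_mixture_weights(components, observations)
print(weights)
```

Holding the components fixed is what makes this toy problem convex; estimating components and weights jointly, as in standard EM for latent-variable models, does not have that property.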


Similar resources

Spectral learning of latent-variable PCFGs: algorithms and sample complexity

We introduce a spectral learning algorithm for latent-variable PCFGs (Matsuzaki et al., 2005; Petrov et al., 2006). Under a separability (singular value) condition, we prove that the method provides statistically consistent parameter estimates. Our result rests on three theorems: the first gives a tensor form of the inside-outside algorithm for PCFGs; the second shows that the required tensors ...


Experiments with Spectral Learning of Latent-Variable PCFGs

Latent-variable PCFGs (L-PCFGs) are a highly successful model for natural language parsing. Recent work (Cohen et al., 2012) has introduced a spectral algorithm for parameter estimation of L-PCFGs, which—unlike the EM algorithm—is guaranteed to give consistent parameter estimates (it has PAC-style guarantees of sample complexity). This paper describes experiments using the spectral algorithm. W...


Diversity in Spectral Learning for Natural Language Parsing

We describe an approach to create a diverse set of predictions with spectral learning of latent-variable PCFGs (L-PCFGs). Our approach works by creating multiple spectral models where noise is added to the underlying features in the training set before the estimation of each model. We describe three ways to decode with multiple models. In addition, we describe a simple variant of the spectral a...


Tensor Decomposition for Fast Parsing with Latent-Variable PCFGs

We describe an approach to speed-up inference with latent-variable PCFGs, which have been shown to be highly effective for natural language parsing. Our approach is based on a tensor formulation recently introduced for spectral estimation of latent-variable PCFGs coupled with a tensor decomposition algorithm well-known in the multilinear algebra literature. We also describe an error bound for t...


Spectral Learning of Latent-Variable PCFGs

Jeju, Republic of Korea, 8-14 July 2012. ©2012 Association for Computational Linguistics. Spectral Learning of Latent-Variable PCFGs. Shay B. Cohen, Karl Stratos, Michael Collins, Dean P. Foster, and Lyle Ungar. Dept. of Computer Science, Columbia University; Dept. of Statistics/Dept. of Computer and Information Science, University of Pennsylvania. {scohen,stratos,mcollins}@cs.columbia.edu, foster...




Publication date: 2014